Hierarchy Composition GAN for High-fidelity Image Synthesis
Despite the rapid progress of generative adversarial networks (GANs) in image
synthesis in recent years, existing image synthesis approaches work in either
the geometry domain or the appearance domain alone, which often introduces
various synthesis artifacts. This paper presents an innovative Hierarchical
Composition GAN (HIC-GAN) that incorporates image synthesis in the geometry and
appearance domains into an end-to-end trainable network and achieves superior
synthesis realism in both domains simultaneously. We design an innovative
hierarchical composition mechanism that learns realistic composition geometry
and handles occlusions when multiple foreground objects are involved in image
composition. In addition, we introduce a novel attention mask mechanism that
guides the adaptation of foreground object appearance and also provides a
better training reference for learning in the geometry domain. Extensive
experiments on scene text image synthesis, portrait editing and indoor
rendering tasks show that the proposed HIC-GAN achieves superior synthesis
performance both qualitatively and quantitatively.
Comment: 11 pages, 8 figures
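The attention-mask idea above can be illustrated with a minimal compositing sketch. This is not the paper's learned mechanism; it is a hand-written stand-in that shows how a soft mask (here a fixed array rather than a network prediction) blends a foreground object into a background, and the function name `compose_with_mask` is illustrative.

```python
import numpy as np

def compose_with_mask(background, foreground, mask):
    """Blend a foreground object into a background image using a soft
    mask with values in [0, 1]; a simplified stand-in for a learned
    attention-mask mechanism (the mask here is given, not predicted)."""
    mask = mask[..., np.newaxis]  # broadcast the 2-D mask over RGB channels
    return mask * foreground + (1.0 - mask) * background

# Toy 4x4 RGB images: black background, white foreground, and a mask
# covering only the top-left quadrant.
bg = np.zeros((4, 4, 3))
fg = np.ones((4, 4, 3))
m = np.zeros((4, 4))
m[:2, :2] = 1.0
out = compose_with_mask(bg, fg, m)
```

In the paper's setting the mask would be produced by the network and trained end to end, so the blend weights adapt per pixel rather than being hard-coded as here.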
The rectification and recognition of document images with perspective and geometric distortions
Ph.D. (Doctor of Philosophy)
Scene Text Synthesis for Efficient and Effective Deep Network Training
A large number of annotated training images is critical for training accurate
and robust deep network models, but collecting such images is often
time-consuming and costly. Image synthesis alleviates this constraint by
generating annotated training images automatically, which has attracted
increasing interest in recent deep learning research. We develop an innovative
image synthesis technique that composes annotated training images by
realistically embedding foreground objects of interest (OOI) into background
images. The proposed technique consists of two key components that in principle
boost the usefulness of the synthesized images in deep network training. The
first is context-aware semantic coherence, which ensures that the OOI are
placed within semantically coherent regions of the background image. The second
is harmonious appearance adaptation, which ensures that the embedded OOI agree
with the surrounding background in terms of both geometry alignment and
appearance realism. The proposed technique has been evaluated on two related
but very different computer vision challenges, namely scene text detection and
scene text recognition. Experiments on a number of public datasets demonstrate
the effectiveness of the proposed image synthesis technique: deep networks
trained with our synthesized images achieve scene text detection and scene text
recognition performance similar to or even better than that obtained with real
images.
Comment: 8 pages, 5 figures
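The context-aware placement component described above can be sketched as a simple search over a semantic label map. This is a toy version under assumed inputs: `semantic_map` is an integer label image and `target_label` marks the region class considered coherent for the object; the real technique would also score candidates and adapt the object's appearance.

```python
import numpy as np

def candidate_positions(semantic_map, target_label, obj_h, obj_w):
    """Return top-left corners (y, x) where an object of size
    (obj_h, obj_w) fits entirely inside a region carrying the given
    semantic label -- a toy version of context-aware placement."""
    h, w = semantic_map.shape
    positions = []
    for y in range(h - obj_h + 1):
        for x in range(w - obj_w + 1):
            # Accept only windows whose every pixel has the target label.
            if np.all(semantic_map[y:y + obj_h, x:x + obj_w] == target_label):
                positions.append((y, x))
    return positions

# A 5x5 map with a 3x3 block of label 1 (e.g. "sign surface") inside
# a background of label 0; place a 2x2 object on label-1 pixels only.
seg = np.zeros((5, 5), dtype=int)
seg[1:4, 1:4] = 1
spots = candidate_positions(seg, target_label=1, obj_h=2, obj_w=2)
```

A 2x2 object fits at four top-left corners inside the 3x3 labeled block; in practice one candidate would then be chosen and the embedded object's geometry and appearance adapted to the surroundings, per the abstract.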